
A small subset of the large pentatricopeptide repeat (PPR) protein family in higher plants contains a C-terminal small MutS-related (SMR) domain. Although few in number, these proteins figure prominently in the chloroplast biogenesis and retrograde signaling literature because of their striking mutant phenotypes. In this review, we summarize current knowledge of PPR-SMR proteins, focusing on proteomic and mutant studies in Arabidopsis and maize. We also examine their occurrence in other organisms. Our phylogenetic analysis shows that, while they are limited to species that contain chloroplasts, their presence in algae and early branching land plant lineages indicates that the coupling of PPR motifs and an SMR domain into a single protein occurred early in the evolution of the Viridiplantae clade. In addition, we discuss their possible function and examine the conservation between the SMR domains of Arabidopsis PPR proteins and those of proteins from other species that have been shown to possess endonucleolytic activity.



Introduction 

The pentatricopeptide repeat (PPR) protein family was serendipitously discovered through a computational analysis of the then incomplete Arabidopsis thaliana genome sequence that searched for gene products likely to be targeted to plastids and mitochondria. While subsequent analysis revealed that these proteins are ubiquitous in eukaryotes, they are particularly prevalent in terrestrial plants (e.g., 450 members in Arabidopsis). Since their discovery, a plethora of genetic, molecular, and biochemical evidence has suggested that PPR proteins bind RNA in a highly specific manner and facilitate events such as cleavage, editing, splicing, turnover, and translation of their target organellar transcript(s).

PPR proteins are defined by the presence of tandem repeats of a degenerate 31-36 amino acid motif and can be classified based on motif structure and the presence of additional C-terminal domains. The P subfamily consists of PPR proteins with canonical 35 amino acid PPR (P) motifs, while the PLS subfamily includes PPR proteins with additional long (L) or short (S) motif variants and takes its name from the characteristic tandem arrays of P-L-S motif triplets. PLS PPR proteins are further classified, based on their C-terminal domain(s), into the E, E+, and DYW subgroups. In addition, although not yet formally recognized as subgroups, P-class PPR proteins can also be categorized by the presence of additional domains, such as the small MutS-related (SMR) domain.

Searching the Arabidopsis genome reveals that eight proteins contain both PPR motifs and an SMR domain (Fig. 1). Despite the relatively small size of this subgroup, there has been sustained interest in this type of PPR protein since the discovery that genomes uncoupled 1 (GUN1) encodes a PPR protein with a C-terminal SMR domain. GUN1 is a central regulator of plastid retrograde signaling, in which the developmental and/or functional state of the plastid controls the expression of nuclear genes encoding plastid-localized proteins, such as photosynthesis-associated nuclear genes (PhANGs). Despite this important role, we still do not understand the precise molecular mechanisms of GUN1 and other proteins with similar domain architecture, or what specific role, if any, the SMR domain plays in their function. This review focuses on this small but important group of PPR proteins that contain an SMR domain, summarizing current knowledge from studies in higher plants, examining their presence in other organisms, and discussing the possible role of the SMR domain.

The SMR Domain: What Is It?

MutS proteins are key enzymes in the repair of mismatched DNA bases produced during biological processes such as DNA replication. The SMR domain was originally identified in the C-terminal region of the MutS2 protein from the cyanobacterium Synechocystis. MutS2 proteins suppress homologous recombination by endonucleolytic digestion of branched DNA structures formed early in this process, and the nuclease activity of these proteins has been specifically attributed to the SMR domain. Moreover, while not all organisms have MutS2 orthologs, proteins containing SMR domains are widespread in bacterial and eukaryotic species. In a recent review, Fukui and Kuramitsu introduced a classification system for proteins containing SMR domains. Subfamily 1 consists of MutS2 orthologs and is restricted to proteins from bacterial and plant species; subfamily 2 includes proteins with domains in addition to the SMR domain and is usually found only in eukaryotes; and subfamily 3 comprises "stand-alone" SMR domains and includes proteins from both prokaryotes and eukaryotes.

Figure 1. Proteins containing an SMR domain in the model plant Arabidopsis thaliana. A non-redundant set of 12 proteins was identified by searching the Universal Protein Knowledgebase (UniProt; www.uniprot.org) for Arabidopsis proteins that contain the InterPro domain IPR002625 (Smr protein/MutS2 C-terminal domain). Proteins are denoted by their corresponding Arabidopsis Genome Identifier (AGI; ATXGXXXXX), followed, where applicable, by their common name (e.g., GUN1). The protein domain structure is shown alongside each AGI to indicate the presence and location of the pentatricopeptide repeat (PPR), small MutS-related (SMR), domain of unknown function (DUF) 1771, and MutS domains. The schematics were created by combining TPRpred to predict PPR motifs (those with P > 0.01 were excluded), InterProScan to identify other domains, and DOG 1.0 to visualize their respective positions.

In Arabidopsis, 12 proteins contain an SMR domain (Fig. 1). As the domain structures show, AT1G65070 belongs to subfamily 1 (MutS2-like), while the remaining 11 proteins are classified as subfamily 2 SMR proteins. Of these 11 subfamily 2 SMR proteins, eight contain PPR motifs. Consistent with recently used nomenclature, we refer to proteins with this domain architecture as PPR-SMR proteins (PPR-SMRs). BLAST searches using PPR-SMRs also identify other plant PPR proteins with C-terminal domains that potentially represent a highly degenerate SMR domain (e.g., AT3G18110/EMB1270). However, their relationship to bona fide PPR-SMRs remains to be clarified, and they will not be discussed further.
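The subfamily assignment described above can be expressed as a simple rule on domain architecture. The Python sketch below assigns a subfamily from the set of domains a protein carries: a MutS domain plus SMR gives subfamily 1, SMR plus any other domain gives subfamily 2, and SMR alone gives subfamily 3; a PPR-SMR is then a subfamily 2 protein that also carries PPR motifs. This is our simplification of the Fukui and Kuramitsu scheme, not their published procedure, and the domain labels are informal strings rather than InterPro identifiers. Apart from AT1G65070 and AT1G79490, whose domain sets are simplified, the example entries are hypothetical.

```python
from __future__ import annotations


def smr_subfamily(domains: set[str]) -> int | None:
    """Return SMR subfamily 1, 2 or 3, or None if no SMR domain is present."""
    if "SMR" not in domains:
        return None
    others = domains - {"SMR"}
    if "MutS" in others:
        return 1  # MutS2-like
    if others:
        return 2  # SMR plus other (non-MutS) domains
    return 3      # stand-alone SMR


def is_ppr_smr(domains: set[str]) -> bool:
    """A PPR-SMR is a subfamily 2 SMR protein that also carries PPR motifs."""
    return smr_subfamily(domains) == 2 and "PPR" in domains


# Simplified domain sets; the two "toy_" entries are hypothetical.
architectures = {
    "AT1G65070": {"MutS", "SMR"},
    "AT1G79490": {"PPR", "SMR"},
    "toy_DUF1771": {"DUF1771", "SMR"},
    "toy_standalone": {"SMR"},
}

for name, doms in architectures.items():
    print(name, smr_subfamily(doms), is_ppr_smr(doms))
```

Because the rule only looks at which domains are present, it would misplace any protein whose architecture departs from these three patterns; in practice such cases would need manual inspection.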

PPR-SMRs: What Do We Know So Far?

Characterization of PPR-SMRs has focused on higher plant models, such as Arabidopsis and maize. The data collected thus far derive from a combination of proteomic and mutant analyses and are summarized in Table 1 for the Arabidopsis PPR-SMRs and their maize orthologs.

PPR-SMRs in higher plants are localized to both mitochondria and plastids. Of the eight Arabidopsis PPR-SMRs, three have either been found (AT1G79490) or are predicted (AT1G74750 and AT1G18900) to be localized to mitochondria. The corresponding maize orthologs are also predicted to be mitochondrial. For the confirmed mitochondrial PPR-SMR AT1G79490, mutant lines are known to have an embryo-lethal phenotype (EMB2217), with developmental arrest occurring at the globular stage. Moreover, the corresponding gene has been reported to show transient, germination-specific expression at early stages of Arabidopsis seed germination, consistent with an important role in early plant development. However, this is the extent of the information on Arabidopsis mitochondrial PPR-SMRs available in the current literature.

The five remaining PPR-SMRs all have experimental evidence indicating that they localize to the other endosymbiotically derived organelle, the plastid (Table 1). Extensive proteomic data are available for three of these (pTAC2, SVR7, and AT5G46580) and for the corresponding maize orthologs (Zm-pTAC2, ATP4, and PPR53). Specifically, these proteins were found in proteomic studies of Arabidopsis chloroplast stromal megadalton complexes as well as of Arabidopsis and maize plastid nucleoids. In addition, pTAC2 and an ortholog of AT5G46580 were found in preparations of plastid transcriptionally active chromosomes (pTAC) from Arabidopsis and spinach, respectively. A minor fraction of AT5G46580 was also found in a plastid envelope-enriched sample. This places the AT5G46580 protein in three different plastid compartments: the thylakoid membranes (with which nucleoids/TACs are associated), the stroma, and the envelope. While this protein may localize to all of these plastid sub-compartments, it may instead be only loosely associated with the nucleoids and easily removed during preparation of the various sub-compartmental plastid fractions. Furthermore, although no experimental data exist for the Arabidopsis protein, the maize ortholog of AT2G17033, PPR-SMR4, has also recently been detected in nucleoid-enriched fractions. Finally, despite its central role in plastid retrograde signaling pathways, proteomic data for GUN1 are absent, and its plastid localization is based on microscopic analysis of transiently expressed fluorescent protein fusions. GUN1-yellow fluorescent protein (YFP) was shown to accumulate in chloroplasts in a punctate pattern overlapping that of pTAC2-cyan fluorescent protein, indicating co-localization of GUN1 and pTAC2 at actively transcribed sites of plastid nucleoids. However, given that more recent studies suggest that processes such as mRNA cleavage, splicing, and editing, as well as ribosome assembly, take place in association with the nucleoids, it is not clear whether GUN1 is specifically bound to plastid DNA and/or RNA in vivo.

Apart from the absence of detected GUN1 protein, the fact that the other plastid PPR-SMRs are routinely detected in plastid proteomic studies indicates that they are more abundant than other PPR proteins, which are generally considered to be low-abundance proteins. For example, SVR7 was the only PPR protein (out of 450) that could be reliably detected in whole leaf protein samples, where its abundance decreased with increasing leaf age. In addition, recent proteomic analyses have allowed the relative abundance of proteins to be estimated from spectral counts derived from mass spectrometry (MS) analysis. This approach is based on the observation that the number of MS/MS acquisitions of peptides from a protein correlates positively with the relative concentration of that protein in the sample. These data sets have allowed us to assess PPR-SMR protein abundance relative to other PPR proteins, as summarized in Table 2. In general, in both Arabidopsis and maize, PPR-SMRs dominate the protein mass that can be attributed to PPR proteins, contributing 26-53% of the total PPR protein mass in samples ranging from total leaf protein extracts to purified nucleoids. While the exact reason for the high abundance of these proteins remains to be determined, we speculate that it could reflect binding to multiple targets (e.g., ATP4, see below) and/or to highly abundant targets (e.g., rRNA, see SVR7 below). More specifically, in maize, Zm-pTAC2 was the most abundant PPR-SMR in all samples in which PPR-SMRs were detected (13-34% of total PPR protein mass), with PPR53 the next most abundant (6.5-21%). In Arabidopsis, pTAC2 was also the most abundant PPR-SMR in nucleoids (34%), but in the other samples (total leaf and high molecular weight stromal fractions), SVR7 was consistently the dominant PPR-SMR (24-34%). Interestingly, the maize ortholog of SVR7, ATP4, although detected in all of these samples, was always found at lower levels (1-3% of total PPR protein mass), indicating different expression levels of these orthologs in the monocot and dicot lineages. It remains to be determined whether this difference underlies their reported functional divergence (see below).
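As we read Table 2, each percentage of "total PPR protein mass" is a ratio of NadjSPC values: the NadjSPC of a PPR-SMR divided by the NadjSPC summed over all PPR proteins identified in the same sample. The Python sketch below illustrates that arithmetic. Apart from SVR7, pTAC2, and AT5G46580, the protein names are hypothetical, and all numerical values are invented so that the output mirrors the first row of Table 2 (48% in total: 32% SVR7, 14% pTAC2, 2% AT5G46580).

```python
from __future__ import annotations


def ppr_smr_mass_share(
    nadjspc: dict[str, float], ppr_ids: set[str], ppr_smr_ids: set[str]
) -> tuple[float, dict[str, float]]:
    """Return (% of total PPR protein mass from PPR-SMRs, per-PPR-SMR percentages).

    nadjspc maps protein IDs to normalized adjusted spectral counts (NadjSPC);
    ppr_smr_ids is assumed to be a subset of ppr_ids.
    """
    ppr_total = sum(nadjspc.get(p, 0.0) for p in ppr_ids)
    if ppr_total == 0:
        return 0.0, {}
    per_protein = {p: 100.0 * nadjspc.get(p, 0.0) / ppr_total for p in sorted(ppr_smr_ids)}
    return sum(per_protein.values()), per_protein


# Invented NadjSPC values for one hypothetical sample.
sample = {
    "SVR7": 0.00016,
    "pTAC2": 0.00007,
    "AT5G46580": 0.00001,
    "hypothetical_PPR": 0.00026,    # a PPR protein without an SMR domain
    "hypothetical_other": 0.99950,  # all non-PPR proteins, pooled
}
ppr = {"SVR7", "pTAC2", "AT5G46580", "hypothetical_PPR"}
ppr_smr = {"SVR7", "pTAC2", "AT5G46580"}

total, breakdown = ppr_smr_mass_share(sample, ppr, ppr_smr)
print(f"PPR-SMRs: {total:.0f}% of PPR protein mass")
for name, pct in breakdown.items():
    print(f"  {name}: {pct:.1f}%")
```

Because the denominator is the PPR subtotal rather than the whole sample, these shares can be large even when PPR proteins as a group make up well under 1% of the sample's protein mass.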

PPR-SMR mutant analysis reveals diverse phenotypes and putative targets. Genetic approaches show that, despite their similar protein architecture, plastid PPR-SMR mutants differ dramatically in their gross and molecular phenotypes. For example, at the level of plant vitality and growth, Arabidopsis mutant phenotypes range from seedling lethal (ptac2) to moderately slower growth and paler leaves (svr7) to a normal, wild-type-like phenotype (gun1) under normal growth conditions. The same holds for the maize PPR-SMR orthologs, whose seedling phenotypes range from wild-type-like (ppr-smr-4) to very pale yellow-green (Zm-ptac2 and ppr53; Table 1).

"Genomes uncoupled" (GUN) refers to the mutant phenotype in which nuclear and plastid gene expression are uncoupled. Twenty years ago, gun mutants were identified from a mutagenized population of plants containing the GUS reporter gene driven by the promoter of a gene encoding a light-harvesting complex protein, LHCB1.2. Mutants impaired in plastid-to-nucleus signaling were identified by screening seedlings in the presence of the carotenoid biosynthesis inhibitor norflurazon (NF). The initial publication from this screen identified three gun mutants (gun1, gun2, gun3) in which, unlike in the control line, LHCB1.2 expression was not repressed after NF treatment. Since then, these and other gun mutants have been characterized, but it was not until 2007 that GUN1 was found to be a plastid-localized PPR-SMR protein.

In addition to the classical "genomes uncoupled" phenotype, characterized by the inability to repress PhANG expression when plastid function is inhibited, gun1 mutants are also impaired in their ability to de-etiolate, indicating that GUN1 plays a role in the transition from heterotrophic to photoautotrophic growth. Moreover, gun1 is unique among the gun mutants in that repression of PhANGs is impaired when seedlings are treated with either NF or plastid translation inhibitors such as lincomycin. This indicates that GUN1 is required for a retrograde signaling pathway involving plastid gene expression as well as for another pathway involving carotenoid biosynthesis. For detailed information and further discussion of GUN1 and plastid retrograde signaling, we direct the reader to recent reviews in this area.

pTAC2 was identified as one of 18 novel components of plastid transcriptionally active chromosomes (pTACs). The ptac2 mutant is viable only when an exogenous carbon source is available; when one is provided, it develops yellow cotyledons and pale green primary leaves but is unable to proceed to reproductive growth. Examination of plastid ultrastructure in the ptac2 mutant indicates that plastid development is severely impaired. Analysis of the abundance of plastome-encoded transcripts suggests that pTAC2 is involved in plastid-encoded polymerase (PEP)-dependent transcription and in the processing of chloroplast RNAs, as ptac2 mutant plants showed strongly reduced accumulation of PEP-generated transcripts.

The svr7 mutant was identified in a screen for suppressors of var2 variegation. VAR2 encodes a plastid protease (FtsH), and in its absence leaves develop a characteristic variegated pattern, including white sectors where chloroplasts fail to develop. However, the svr7 var2 double mutant lacks these white sectors. Processing of 23S, 16S, and 4.5S rRNA is perturbed in svr7. In addition, a specific reduction in the accumulation of the ATP synthase subunits A, B, E, and F and reduced ribosome association of the atpB/E and rbcL mRNAs have been observed in the svr7 mutant, indicating that SVR7 is involved in translational activation of these transcripts. Given its similarity to GUN1, the authors also investigated whether the svr7 mutant displays a "gun" phenotype by testing PhANG responses upon NF treatment. These experiments indicated that the svr7 mutant, like the wild type, is able to repress PhANG expression upon inhibition of chloroplast function and thus does not display a "gun" phenotype.

ATP4, the maize ortholog of SVR7, has also been characterized. RNA co-immunoprecipitation assays identified the dicistronic plastid atpB/E mRNA as an in vivo ligand of ATP4. As in the svr7 mutant, polysome analysis indicates that translation of the atpB/E transcript is perturbed in the atp4 mutant. However, atp4 also shows reduced translation of the atpA transcript and exhibits a more extreme phenotype than svr7, with apparent loss of the plastid ATP synthase complex. Also, in contrast to svr7, the accumulation of processed atpF and psaJ transcripts and the stabilization of dicistronic rpl16-rpl14 RNAs are affected in the atp4 mutant. Thus, the phenotypes of the atp4 and svr7 mutants suggest that the functions of these orthologs are not strictly conserved. Furthermore, while over-accumulation of plastid rRNA precursors is observed in the atp4 mutant, as it is in svr7, the authors note that these defects are likely to be secondary, as they are not specific to atp4 and are also observed in other mutants impaired in plastid gene expression and/or ATP synthase activity.

Table 2. The relative abundance of PPR-SMR proteins in different Arabidopsis and maize protein samples, based on normalized adjusted spectral counts as an estimate of protein abundance

| Reference | Protein fraction description | No. proteins identified | No. PPR proteins identified | No. PPR-SMR proteins identified | % of total protein mass attributed to PPR proteins | % of total PPR protein mass attributed to PPR-SMR proteins |
|---|---|---|---|---|---|---|
| 24 | Total leaf protein (Arabidopsis, Col-0, rosette leaves) | 3424 | 17 | 3 | 0.05 | 48% (32% SVR7, 14% pTAC2, 2% AT5G46580) |
| 17 | Total leaf protein (Arabidopsis, no information) | 815 | 9 | 3 | 0.02 | 50% (34% SVR7, 14% AT5G46580, 2% pTAC2) |
| 16 | Stromal fraction: low molecular weight (Arabidopsis, Col-0, rosette leaves, 55 d old) | 398 | 0 | 0 | 0 | 0 |
| 16 | Stromal fraction: high molecular weight A (Arabidopsis, Col-0, rosette leaves, 55 d old) | 293 | 9 | 3 | 0.46 | 33% (24% SVR7, 4.5% pTAC2, 4.5% AT5G46580) |
| 16 | Stromal fraction: high molecular weight B (Arabidopsis, Col-0, rosette leaves, 55 d old) | 230 | 6 | 3 | 0.47 | 53% (28% SVR7, 19% pTAC2, 6% AT5G46580) |
| 17 | Nucleoids (Arabidopsis, Col-0, young seedlings) | 1026 | 26 | 3 | 1.04 | 47% (34% pTAC2, 12% AT5G46580, 1% SVR7) |
| 17 | Proplastids (maize, third leaf blade of 8-9 d old seedlings) | 2242 | 32 | 3 | 0.67 | 48% (24% Zm-pTAC2, 21% PPR53, 3% ATP4) |
| 18 | Proplastids (maize, third leaf blade of 8-9 d old seedlings) | 1717 | 17 | 3 | 0.41 | 53% (34% Zm-pTAC2, 17% PPR53, 2% ATP4) |
| 23 | Chloroplasts (maize, WT-T43, third leaf blade of 12-14 d old seedlings) | 1428 | 5 | 0 | 0.002 | 0 |
| 18 | Nucleoids, average of leaf base, leaf tip, and young leaf samples (maize) | 1092 | 63 | 4 | 4.65 | 29% (16% Zm-pTAC2, 11% PPR53, 1.5% ATP4, 0.5% PPR-SMR4) |
| 18 | Nucleoids, leaf base (maize, third leaf blade of 8-9 d old seedlings) | 678 | 46 | 4 | 4.89 | 27% (13% Zm-pTAC2, 12% PPR53, 1% ATP4, 1% PPR-SMR4) |
| 18 | Nucleoids, leaf tip (maize, third leaf blade of 8-9 d old seedlings) | 710 | 35 | 3 | 2.68 | 26% (18% Zm-pTAC2, 6.5% PPR53, 1.5% ATP4) |
| 18 | Nucleoids, young leaves (maize, leaf blades of 7-8 d old seedlings) | 827 | 55 | 4 | 6.38 | 32% (18% Zm-pTAC2, 12% PPR53, 1.5% ATP4, 0.5% PPR-SMR4) |

For quantitation of protein mass, each protein accession is scored for total MS/MS spectral counts (SPC), unique SPC (matching uniquely to that accession), and adjusted SPC (adjSPC). AdjSPC is the sum of the unique SPCs and the SPCs from peptides shared across accessions, with shared SPCs distributed in proportion to each accession's unique SPC. The normalized adjSPC (NadjSPC) for each protein is calculated by dividing its adjSPC by the sum of all adjSPC values for the proteins in the sample (e.g., per gel lane or protein extract). Thus, NadjSPC provides a relative measure of protein abundance by mass. For example, a protein with NadjSPC = 0.01 contributes approximately 1% of the protein mass of the analyzed sample. NadjSPC values were obtained from the publications indicated and used to calculate the relative abundance of PPR and PPR-SMR proteins.
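The Table 2 footnote's definitions of adjSPC and NadjSPC translate directly into a few lines of code. The Python sketch below computes both from a list of peptides, each given as a spectral count and the set of accessions it matches. The peptide data are invented, and the even split used when none of the accessions sharing a peptide has unique counts is our own assumption; the footnote does not cover that case.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def adjusted_spc(peptides: Iterable[tuple[int, frozenset[str]]]) -> dict[str, float]:
    """Compute adjusted spectral counts (adjSPC) per accession.

    Each peptide is (spectral count, accessions it matches). Counts from peptides
    matching a single accession are unique; counts from shared peptides are split
    among their accessions in proportion to those accessions' unique counts.
    """
    peptides = list(peptides)  # iterated twice below
    unique: dict[str, float] = defaultdict(float)
    for spc, accs in peptides:
        if len(accs) == 1:
            unique[next(iter(accs))] += spc
    adj: dict[str, float] = defaultdict(float, unique)
    for spc, accs in peptides:
        if len(accs) > 1:
            weights = {a: unique.get(a, 0.0) for a in accs}
            total = sum(weights.values())
            for a in accs:
                # Assumption: split evenly if no sharing accession has unique counts.
                share = weights[a] / total if total > 0 else 1.0 / len(accs)
                adj[a] += spc * share
    return dict(adj)


def normalize(adj: dict[str, float]) -> dict[str, float]:
    """NadjSPC: each adjSPC divided by the sum of all adjSPC values in the sample."""
    total = sum(adj.values())
    if total == 0:
        raise ValueError("sample has no spectral counts")
    return {a: v / total for a, v in adj.items()}


peptides = [
    (10, frozenset({"A"})),
    (30, frozenset({"B"})),
    (8, frozenset({"A", "B"})),  # shared peptide: split 10:30 by unique SPC
    (2, frozenset({"C"})),
]
nadj = normalize(adjusted_spc(peptides))
print(nadj)
```

Here the 8 shared counts go 2 to A and 6 to B, giving adjSPC values of 12, 36, and 2 out of 50, so A contributes roughly a quarter of the sample's estimated protein mass.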



Figure 2. Bayesian phylogenetic tree of PPR-SMR protein sequences from a range of species. Sequences of PPR-SMR proteins were obtained from BLAST searches and InterPro domain searches (IPR002625 and IPR002885) and aligned using MUSCLE. A phylogenetic tree was constructed using MrBayes version 3.2.1, which employs Markov chain Monte Carlo (MCMC) sampling to approximate the posterior probabilities of phylogenies (shown above the branches). MrBayes 3.2.1 was run in parallel on the Fornax supercomputer (located at iVEC@UWA) using the BEAGLE library with a mixed model of molecular evolution (determined using jModelTest), with 12 chains for 50 million generations and trees sampled every 1000 generations. All runs reached a plateau in likelihood score; the standard deviation of split frequencies (0.0015) and a potential scale reduction factor close to one indicated that the MCMC chains had converged. Sequences are color shaded according to their lineage, as indicated.
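The potential scale reduction factor (PSRF) mentioned in the Figure 2 legend compares the variance of a sampled parameter between independent MCMC chains with its variance within each chain; values near 1 indicate that the chains are sampling the same distribution. The Python sketch below implements the basic Gelman-Rubin form for a single parameter. MrBayes computes PSRF internally, and its exact variant may differ from this textbook formula; the chain values in the example are invented.

```python
from __future__ import annotations

import math


def psrf(chains: list[list[float]]) -> float:
    """Basic Gelman-Rubin potential scale reduction factor for one parameter.

    chains: m >= 2 chains, each a list of n >= 2 post-burn-in samples.
    """
    m = len(chains)
    if m < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length >= 2")
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # B: between-chain variance of the chain means, scaled by n.
    b = n * sum((mu - grand_mean) ** 2 for mu in means) / (m - 1)
    # W: average of the within-chain sample variances.
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)) / m
    if w == 0:
        raise ValueError("zero within-chain variance")
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)


agree = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]]        # chains overlap
disagree = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]  # chains stuck apart
print(psrf(agree), psrf(disagree))
```

With such short toy chains the basic estimate can fall slightly below 1 when the chains agree; the informative signal is the large value obtained when the chains have not mixed.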



PPR-SMRs: Which Organisms Have Them?

All of the studies to date that have identified PPR-SMR proteins or investigated their function have used higher plant species and have focused on single Arabidopsis or maize proteins; there is little information on the evolutionary relationships between PPR-SMR proteins within or across species. In particular, the extent to which this protein architecture, in which PPR and SMR domains are coupled in a single protein, is present in other organisms has not previously been examined. The recent increase in whole genome sequences available for organisms representing diverse lineages provides an opportunity to examine the presence of PPR-SMR proteins in a wide range of organisms and to determine their origins and diversification.

PPR-SMR protein sequences were collated in two ways: by searching UniProt for proteins containing both PPR and SMR domains using the InterPro identifiers IPR002625 and IPR002885, and by BLAST searches using the SMR domains of the Arabidopsis members of this PPR subgroup. The sequences obtained were manually curated so that, where possible, major clades were represented by organisms with complete genome sequences, and truncated and redundant sequences were removed. It is already known that proteins containing SMR domains are found in both prokaryotic and eukaryotic organisms, while PPR motifs are confined to eukaryotes. Thus, it is not surprising that PPR-SMR proteins are essentially found only in eukaryotic organisms and, interestingly, are largely confined to the Viridiplantae clade. One major exception is the PPR-SMR sequences found in two strains of Legionella longbeachae. However, given the paucity of PPR proteins encoded in other bacterial species, these sequences are likely to be remnants of a horizontal gene transfer event, as has been suggested previously to explain the PPR genes identified in a few isolated bacterial species.
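The domain-based part of this search can be written down as a single query. The Python sketch below builds, but does not fetch, a UniProtKB search URL that requires both InterPro identifiers (IPR002625 for the SMR domain and IPR002885 for PPR motifs), optionally restricted to one taxon. The REST endpoint and the `xref:interpro-...` and `organism_id:` query fields are our assumptions about UniProt's current query syntax and should be checked against UniProt's documentation; the BLAST searches and manual curation described above are not covered.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed UniProt REST search endpoint (check against UniProt's documentation).
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def uniprot_domain_query_url(interpro_ids, taxon_id=None, fmt="fasta"):
    """Build (but do not fetch) a UniProtKB search URL requiring every InterPro ID.

    The 'xref:interpro-<ID>' and 'organism_id:<taxon>' query fields are
    assumptions about UniProt's query syntax.
    """
    clauses = [f"(xref:interpro-{ipr})" for ipr in interpro_ids]
    if taxon_id is not None:
        clauses.append(f"(organism_id:{taxon_id})")
    params = {"query": " AND ".join(clauses), "format": fmt}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


# SMR (IPR002625) AND PPR (IPR002885), restricted to Arabidopsis thaliana
# (NCBI taxonomy ID 3702).
url = uniprot_domain_query_url(["IPR002625", "IPR002885"], taxon_id=3702)
print(url)
print(parse_qs(urlparse(url).query)["query"][0])  # decoded query string
```

Dropping `taxon_id` gives the cross-species version of the same search; the returned sequences would still need the curation steps described in the text.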

PPR-SMR proteins were also found in heterokont species (brown algae, diatoms). Until recently, heterokont chloroplasts were thought to be derived from the secondary endosymbiosis of an ancestral red alga by a eukaryotic host (the "chromalveolate hypothesis"). However, we were unable to find PPR-SMR sequences in red algal genomes (Chondrus crispus and Cyanidioschyzon merolae). This suggests that the PPR-SMR proteins found in heterokonts may be derived from an ancestral endosymbiont from the green algal lineage, supporting the more recent hypothesis that endosymbiosis of a green alga into the ancestral host cell preceded the engulfment of a red alga.

Figure 3. SMR domain alignment to assess amino acid sequence conservation. The SMR domains of the eight Arabidopsis PPR-SMR proteins were aligned with SMR domains from proteins that have been experimentally shown to have endonucleolytic activity. The sequences are denoted by the SMR subfamily type (1_, 2_, or 3_) followed by the AGI (for Arabidopsis proteins) or an alternative identifier (Tt_MutS2, Thermus thermophilus MutS2 protein; Hs_B3BP, Homo sapiens BCL3 binding protein; Ld_CSBP, Leishmania donovani cycling sequence binding protein; Ec_YdaL, Escherichia coli YdaL protein), and the length of the SMR domain (e.g., /1-93). Alignment was performed using MUSCLE and visualized using Jalview (www.jalview.org) with ClustalX coloring by conservation. The positions of previously described conserved regions are indicated on the alignment: the LDXH motif present in subfamily 2 SMR domains and the centrally located HGXG/TGXG motif (subfamilies 1 and 3/subfamily 2) are bounded by the red and blue boxes, respectively.

Our Bayesian phylogenetic analysis (Fig. 2) also reveals that orthologs of all Arabidopsis PPR-SMRs are present in species representing the major angiosperm clades, including both dicots and monocots. However, the putative mitochondrial PPR-SMRs AT1G18900 and AT1G74750 are represented by a single ortholog in most other flowering plants, with Arabidopsis lyrata the only exception, indicating that a recent gene duplication event accounts for the extra protein present in Arabidopsis species. Five PPR-SMRs were identified in the lycophyte and bryophyte models, Selaginella moellendorffii and Physcomitrella patens, respectively. Homologs of GUN1, pTAC2, SVR7/AT5G46580, EMB2217, and AT1G18900/AT1G74750 were found in Selaginella, whereas homologs of GUN1, pTAC2, AT2G17033, and AT1G18900/AT1G74750 were found in Physcomitrella. This suggests that the SVR7/AT5G46580 and EMB2217 clades arose when tracheophytes evolved. However, the discovery of PPR-SMR proteins in chlorophytes (Micromonas, Chlorella, and Ostreococcus) suggests that this type of PPR protein emerged early in the evolution of the PPR protein family in chloroplast-containing lineages.
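The inference that the SVR7/AT5G46580 and EMB2217 clades arose with the tracheophytes follows from a simple rule: on a ladder-shaped species tree, a clade present in angiosperms is placed at the node defined by the earliest-diverging lineage in which it is found. The Python sketch below applies that rule to the presence data reported in this paragraph. It is a deliberately simplified, parsimony-style illustration, not the Bayesian analysis behind Fig. 2: it assumes the three-lineage ladder shown in the code, treats absences in later-diverging lineages (e.g., AT2G17033 in Selaginella) as losses, and omits the chlorophytes because the text does not say which clades they contain.

```python
from __future__ import annotations

# Simplified ladder, earliest-diverging lineage first: mosses diverged before
# lycophytes, which diverged before the angiosperms.
LINEAGES = ["Physcomitrella", "Selaginella", "angiosperms"]
# Node uniting LINEAGES[i] with every later lineage in the ladder.
NODE_NAMES = ["land plants", "tracheophytes", "angiosperms"]

# Presence of homologs as reported in the text.
PRESENCE = {
    "GUN1": {"Physcomitrella", "Selaginella", "angiosperms"},
    "pTAC2": {"Physcomitrella", "Selaginella", "angiosperms"},
    "AT2G17033": {"Physcomitrella", "angiosperms"},
    "AT1G18900/AT1G74750": {"Physcomitrella", "Selaginella", "angiosperms"},
    "SVR7/AT5G46580": {"Selaginella", "angiosperms"},
    "EMB2217": {"Selaginella", "angiosperms"},
}


def inferred_origin(present_in: set[str]) -> str:
    """Node at which a clade is inferred to have arisen on the ladder.

    Assumes the clade is present in angiosperms, so the most recent common
    ancestor of its carriers is set by the earliest-diverging lineage with it.
    """
    for lineage, node in zip(LINEAGES, NODE_NAMES):
        if lineage in present_in:
            return node
    raise ValueError("clade not found in any listed lineage")


for clade, lineages in PRESENCE.items():
    print(f"{clade}: arose in the ancestor of {inferred_origin(lineages)}")
```

Under these assumptions, GUN1, pTAC2, AT2G17033, and AT1G18900/AT1G74750 trace back to the land plant ancestor, while SVR7/AT5G46580 and EMB2217 are placed at the tracheophyte node.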

The SMR Domain of PPR-SMRs: What Is Its Function?

The function of the SMR domain in PPR-SMR proteins has not yet been comprehensively explored. To date, the only published examination of the specific role of the SMR domain comes from the characterization of GUN1, which reported DNA-binding activity of the SMR domain using a non-specific substrate (calf thymus DNA). However, studies of SMR domain-containing proteins in other organisms have focused on its role as a nuclease, and evidence now exists for endonucleolytic activity in members of all three subfamilies of SMR domain-containing proteins.

Functional characterization of the C-terminal domain of the human BCL3 binding protein (a subfamily 2 SMR domain) provided the first evidence for endonuclease activity of the SMR domain. The recombinant domain was found to non-specifically nick supercoiled plasmid DNA, generating the open circular form of the plasmid and thereby demonstrating the nicking endonuclease activity of the protein. A specific role for the SMR domain of this protein in binding DNA was later demonstrated. Nuclease and DNA-binding activities were also confirmed for the subfamily 1 SMR domain of the MutS2 protein from Thermus thermophilus and for a subfamily 3 "stand-alone" SMR domain-containing protein, YdaL, from Escherichia coli. Interestingly, the Leishmania donovani mRNA cycling sequence-binding protein (LdCSBP), which contains a CCCH zinc-finger RNA-binding domain and a subfamily 2 SMR domain, has been reported to possess RNA endonuclease activity. The SMR domain of LdCSBP alone exhibits both DNA and RNA endonuclease activity, but the full-length protein shows only sequence-specific RNA cleavage activity.

Given these reported activities of SMR domain-containing proteins, it is tempting to speculate on possible functions of PPR-SMR proteins. One possibility is that PPR-SMRs are dual-function factors in both DNA and RNA metabolism, with the PPR motifs conferring RNA-binding activity and the SMR domain conferring DNA-binding activity. This would be consistent with a role in transcription (e.g., pTAC2). Alternatively, by analogy to LdCSBP, the observation that a protein containing RNA-binding and SMR domains can act as an RNA endonuclease raises the question of whether PPR-SMR proteins act as sequence-specific RNA endonucleases, with the PPR motifs conferring RNA sequence specificity and the SMR domain conferring endonuclease activity. This possibility would be consistent with a role in mRNA processing (e.g., SVR7 and ATP4).

While these proposed functions require rigorous experimental verification, the fact that endonucleolytic activity has been reported for four SMR domain-containing proteins prompted us to compare the SMR domains of these proteins with those of the Arabidopsis PPR-SMRs to determine whether conserved residues are present (Fig. 3). Examination of SMR domains from the different SMR subfamilies has identified conserved motifs specific to each subfamily. Subfamilies 1 and 3 have a characteristic HGXG motif located centrally within the SMR domain, whereas subfamily 2 has a TGXG motif at the same position. These motifs are perfectly conserved in the proteins known to have endonuclease activity (Fig. 3). Among the Arabidopsis proteins, five of the eight SMR domains linked to PPR motifs have a TGXG motif at this position; the three SMR domains that diverge from this motif are those of pTAC2, EMB2217, and AT2G17033. Subfamily 2 SMR domains are also characterized by an LDXH motif toward the N terminus of the SMR domain. This motif is conserved in the subfamily 2 SMR domains already shown to confer DNA and RNA endonuclease activity (the human B3BP protein and the Leishmania CSBP protein). Among the Arabidopsis proteins, only two of the eight SMR domains linked to PPR motifs, those of GUN1 and AT2G17033, have LDXH at this position. Thus, GUN1 is the only Arabidopsis PPR-SMR containing both conserved motifs.
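The motif comparison above amounts to asking, for each aligned SMR domain, whether the residues in a given block of alignment columns fit a pattern such as TGXG, where X is any residue. The Python sketch below performs that check, assuming an alignment that has already been produced (e.g., by MUSCLE) and is given as gapped strings. The sequences and column offsets are invented toy values, not the real SMR domains shown in Fig. 3, and treating a gap as failing the X position is our own choice.

```python
from __future__ import annotations


def matches_motif(aligned: str, start: int, motif: str) -> bool:
    """True if aligned[start:start+len(motif)] fits motif ('X' = any residue, not a gap)."""
    window = aligned[start:start + len(motif)]
    if len(window) != len(motif):
        return False
    return all((m == "X" and r != "-") or m == r for m, r in zip(motif, window))


def central_motif(aligned: str, start: int) -> str:
    """Name the central motif found at the given column, or '-' if neither fits."""
    for motif in ("TGXG", "HGXG"):
        if matches_motif(aligned, start, motif):
            return motif
    return "-"


# Invented gapped sequences and column offsets, for illustration only.
alignment = {
    "toy_subfamily2": "--LDAHKVTGLGSS",
    "toy_subfamily1": "--IDKHKVHGRGSS",
}
LDXH_START = 2     # hypothetical N-terminal motif column
CENTRAL_START = 8  # hypothetical central motif column

for name, seq in alignment.items():
    ldxh = "LDXH" if matches_motif(seq, LDXH_START, "LDXH") else "-"
    print(name, ldxh, central_motif(seq, CENTRAL_START))
```

Checking fixed alignment columns, rather than searching each sequence for the pattern anywhere, mirrors the question asked in Fig. 3: whether the motif is present at the conserved position.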

Conclusions and Perspectives 

PPR proteins that contain a C-terminal SMR domain represent a small but enigmatic subset of the PPR protein family whose members in higher plants show diverse protein abundance and varied putative functions in organellar RNA metabolism. Phylogenetic analysis indicates that PPR-SMRs are confined to green plants and algae but that they are ancient proteins that have modestly diversified during angiosperm evolution. Despite their ancient